
Saliency Map for GAT #435

Merged: 19 commits merged into develop from gat-saliency-map on Aug 8, 2019

Conversation

@sktzwhj (Contributor) commented Jun 28, 2019

Hi @adocherty and @youph ,

This PR adds the saliency map for the GAT model. @adocherty had a look at the previous implementation, and this one adapts it to the new generator APIs.

-Huijun

@sktzwhj added the ml label Jun 28, 2019
@sktzwhj added this to the v0.4 Sprint 7 milestone Jun 28, 2019
@sktzwhj requested review from adocherty and youph June 28, 2019 04:04
@sktzwhj self-assigned this Jun 28, 2019
@review-notebook-app commented:

Check out this pull request on ReviewNB: https://app.reviewnb.com/stellargraph/stellargraph/pull/435

You'll be able to see visual diffs and write comments on notebook cells. Powered by ReviewNB.

This is typically the logit or softmax output.
"""

def __init__(self, model, generator):
Code Climate:

Function __init__ has a Cognitive Complexity of 8 (exceeds 5 allowed). Consider refactoring.

)
return np.squeeze(total_gradients * X_diff, 0)

def get_integrated_link_masks(
Code Climate:

Function get_integrated_link_masks has a Cognitive Complexity of 6 (exceeds 5 allowed). Consider refactoring.

A_val = self.A
# Execute the function to compute the gradient
self.set_ig_values(1.0, 0.0)
if self.is_sparse and not sp.issparse(A_val):
Code Climate:

Identical blocks of code found in 2 locations. Consider refactoring.

A_val = self.A
# Execute the function to compute the gradient
self.set_ig_values(alpha, non_exist_edge)
if self.is_sparse and not sp.issparse(A_val):
Code Climate:

Identical blocks of code found in 2 locations. Consider refactoring.

This is typically the logit or softmax output.
"""

def __init__(self, model, generator):
Code Climate:

Cyclomatic complexity is too high in method __init__. (6)

@codeclimate (bot) commented Jun 28, 2019

Code Climate has analyzed commit 1dbe778 and detected 8 issues on this pull request.

Here's the issue category breakdown:

Category      Count
Complexity    5
Duplication   2
Security      1

View more on Code Climate.

@sktzwhj closed this Jul 1, 2019
@sktzwhj reopened this Jul 1, 2019
@adocherty (Contributor) left a comment:

Notebook

stellargraph_dev/demos/interpretability/gat/node-link-importance-demo-gat.ipynb

  • The Jupyter notebook does not have a title.
  • When you introduce importances and saliency maps you don’t seem to describe the assumptions of the functions. I believe the integrated gradient methods assume that the features are all binary. We should state this assumption clearly in the description. Also indicate what we should do if the features are not binary.
  • Sanity checks and others are best moved from notebooks to unit tests:
    • delta & non_exist_edge variable check
    • serialisation check
    • ego-graph integrate_link_mask check
    • masked_array check
  • Let’s do the same here as we discussed for GCN - I think the functions should all take a node ID not an index so we don’t confuse the different indices for graph vs pandas features.
  • How about changing the name of get_integrated_link_masks to get_link_importance to match the get_node_importance function?
  • As you have used sorted_indices[::-1] in the preceding cell, in the line:
print('Top {} most important links by integrated gradients are {}'.format(topk, integrated_link_importance_rank[-topk:]))

Shouldn't integrated_link_importance_rank[-topk:] be integrated_link_importance_rank[:topk]? (See the small sketch after this list.)

  • The nodes in G_ego are by IDs, therefore I think we should use target_nid not target_idx here:
  if nid == target_nid:
    continue
  • In the visualisation code and elsewhere there are many instances of list(G.nodes()).index(·) that can be replaced with graph_nodes.index(·)
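
For illustration, a tiny numpy sketch (not from the PR) of why the first k entries of a descending argsort are the top k:

import numpy as np

scores = np.array([0.1, 0.9, 0.5, 0.7])
rank = np.argsort(scores)[::-1]  # descending order: indices [1, 3, 2, 0]
topk = 2
print(rank[:topk])   # [1 3] -> indices of the two largest scores
print(rank[-topk:])  # [2 0] -> actually the two smallest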

layer/graph_attention.py

  • I think we need to subtract the row maximum from dense before the exponential to avoid floating point errors in the exp function:
W = (
    (1 - self.non_exist_edge) * self.delta * A
    + self.non_exist_edge
    * (
        A
        + self.delta * (K.ones(shape=[N, N], dtype="float") - A)
        + K.eye(N)
    )
) * K.exp(dense - K.max(dense, axis=1, keepdims=True))
dense = W / K.sum(W, axis=1, keepdims=True)

This means the current tests fail, but I don’t believe this is bad – the results from this are different from the implementation without the subtraction. The results in the notebook are the same with this normalisation.
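
As a side note, a minimal numpy sketch (not from the PR) of why the subtraction is safe: softmax is invariant to a per-row shift, while exp overflows on large logits without it:

import numpy as np

dense = np.array([[1000.0, 1001.0], [3.0, 4.0]])
# np.exp(dense) alone overflows to inf for the first row
shifted = np.exp(dense - dense.max(axis=1, keepdims=True))
stable = shifted / shifted.sum(axis=1, keepdims=True)
print(stable)  # both rows give [0.2689, 0.7311]; the shift cancels out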

There is, as you point out, an issue with the number of non-zero elements in the link importance calculation; however, I think this is just an issue of floating point accuracy, and counting elements above a small threshold works fine: i.e. using the following

print("Number of non-zero elements in integrate_link_importance: {}".format(np.sum(np.abs(integrate_link_importance) > 1e-8)))

gives:

Number of edges in the ego graph: 210
Number of non-zero elements in integrate_link_importance: 210

I would like to add a unit test to the tests/layer/test_graph_attention.py file that checks that the results from the implementation with saliency_map_support=False are the same as those for saliency_map_support=True – such as the test_apply_average_with_neighbours method. Would you like to add this test?
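
A hypothetical pytest sketch of that check follows; build_gat and make_example_inputs are assumed helpers, not the repository's actual test utilities:

import numpy as np

def test_saliency_map_support_consistency():
    # Two GAT models differing only in saliency_map_support should give
    # the same predictions once they share weights.
    model_plain = build_gat(saliency_map_support=False)  # assumed helper
    model_support = build_gat(saliency_map_support=True)
    model_support.set_weights(model_plain.get_weights())

    X, A = make_example_inputs()  # assumed fixture
    np.testing.assert_allclose(
        model_plain.predict([X, A]), model_support.predict([X, A]), atol=1e-6
    )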

utils/saliency_maps_gat

  • I think we should move these files to utils/saliency_maps and name them integrated_gradients_gat.py and saliency_gat.py. Additionally, rename the IntegratedGradients class to IntegratedGradientsGAT and GradientSaliency to GradientSaliencyGAT. This way we can import all saliency objects into the same namespace in utils/saliency_maps/__init__.py. (A sketch of the resulting __init__.py follows this list.)
  • As in the comments on the notebook above, let’s use the node IDs instead of index in all functions.
  • In get_integrated_node_masks the features are taken from zero to one.
    • This seems to assume that the features are all binary. What happens if they are not? We should state this assumption clearly in the class documentation.
    • This seems to only consider the importance of features that are one, as any features that are zero will be zero for all steps. I would have guessed that there would also be a function that takes those features from 1 to 0, you talked about this in the paper but I forget what you said now!
  • There should be a description of what each method does and the assumptions made.
  • In the argument list try not to put multiple variables on the same line, rather describe each variable separately on its own line.
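
For illustration, a sketch of the resulting shared namespace; the GCN-side module names (integrated_gradients.py, saliency.py) are assumptions:

# utils/saliency_maps/__init__.py -- sketch of the proposed layout
from .integrated_gradients import IntegratedGradients  # GCN, assumed module name
from .saliency import GradientSaliency  # GCN, assumed module name
from .integrated_gradients_gat import IntegratedGradientsGAT
from .saliency_gat import GradientSaliencyGAT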

@sktzwhj (Contributor, Author) commented Jul 31, 2019

Notebook

stellargraph_dev/demos/interpretability/gat/node-link-importance-demo-gat.ipynb

  • The Jupyter notebook does not have a title.
    I have added a title: Interpreting Nodes and Edges by Saliency Maps in GAT.
  • When you introduce importances and saliency maps you don’t seem to describe the assumptions of the functions. I believe the integrated gradient methods assume that the features are all binary. We should state this assumption clearly in the description. Also indicate what we should do if the features are not binary.

IG does seem to work well for binary features compared with vanilla methods. However, it does not assume binary features. In fact, it was initially used in the image domain where features are not binary.
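
For reference, a minimal numpy sketch of integrated gradients (Sundararajan et al., 2017), assuming a grad_fn(X) that returns the gradient of the class score with respect to the features X; the PR's implementation differs in its details:

import numpy as np

def integrated_gradients(grad_fn, X_val, X_baseline=None, steps=20):
    if X_baseline is None:
        X_baseline = np.zeros_like(X_val)  # the baseline is a free choice, not necessarily zero
    X_diff = X_val - X_baseline
    total_gradients = np.zeros_like(X_val)
    # Average the gradients along the straight-line path from baseline to input
    for alpha in np.linspace(1.0 / steps, 1.0, steps):
        total_gradients += grad_fn(X_baseline + alpha * X_diff)
    return (total_gradients / steps) * X_diff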

  • Sanity checks and others are best moved from notebooks to unit tests:

    • delta & non_exist_edge variable check
    • serialisation check
    • ego-graph integrate_link_mask check
    • masked_array check

Fixed. These sanity checks are now in the unit tests.

  • Let’s do the same here as we discussed for GCN - I think the functions should all take a node ID not an index so we don’t confuse the different indices for graph vs pandas features.
  • How about changing the name of get_integrated_link_masks to get_link_importance to match the get_node_importance function?
    Fixed.
  • As you have used sorted_indices[::-1] in the preceding cell, in the line:
print('Top {} most important links by integrated gradients are {}'.format(topk, integrated_link_importance_rank[-topk:]))

Shouldn’t integrated_link_importance_rank[-topk:] be integrated_link_importance_rank[:topk]?

Good catch! Fixed.

  • The nodes in G_ego are by IDs, therefore I think we should use target_nid not target_idx here:
  if nid == target_nid:
    continue
  • In the visualisation code and elsewhere there are many instances of list(G.nodes()).index(·) that can be replaced with graph_nodes.index(·)

Fixed.

layer/graph_attention.py

  • I think we need to subtract the row maximum from dense before the exponential to avoid floating point errors in the exp function:
W = (
    (1 - self.non_exist_edge) * self.delta * A
    + self.non_exist_edge
    * (
        A
        + self.delta * (K.ones(shape=[N, N], dtype="float") - A)
        + K.eye(N)
    )
) * K.exp(dense - K.max(dense, axis=1, keepdims=True))
dense = W / K.sum(W, axis=1, keepdims=True)

This means the current tests fail, but I don’t believe this is bad – the results from this are different from the implementation without the subtraction. The results in the notebook are the same with this normalisation.

There is, as you point out, an issue with the number of non-zero elements in the link importance calculation; however, I think this is just an issue of floating point accuracy, and counting elements above a small threshold works fine: i.e. using the following

print("Number of non-zero elements in integrate_link_importance: {}".format(np.sum(np.abs(integrate_link_importance) > 1e-8)))

gives:

Number of edges in the ego graph: 210
Number of non-zero elements in integrate_link_importance: 210

That's interesting. It does explain the previous test failure. Fixed in the tests.

I would like to add a unit test to the tests/layer/test_graph_attention.py file that checks that the results from the implementation with saliency_map_support=False are the same as those for saliency_map_support=True – such as the test_apply_average_with_neighbours method. Would you like to add this test?

I have changed the test to add the GAT model with saliency map support as well.

utils/saliency_maps_gat

  • I think we should move these files to utils/saliency_maps and name them integrated_gradients_gat.py and saliency_gat.py. Additionally, rename the IntegratedGradients class to IntegratedGradientsGAT and GradientSaliency to GradientSaliencyGAT. This way we can import all saliency objects into the same namespace in utils/saliency_maps/__init__.py.

Fixed.

  • As in the comments on the notebook above, let’s use the node IDs instead of index in all functions.

Yes, fixed.

  • In get_integrated_node_masks the features are taken from zero to one.

It actually goes from the baseline (which is not necessarily 0) to the current state of X (whose entries are not necessarily 1).

  • This seems to assume that the features are all binary. What happens if they are not? We should state this assumption clearly in the class documentation.

Therefore, we do not assume binary features.

  • This seems to only consider the importance of features that are one, as any features that are zero will be zero for all steps. I would have guessed that there would also be a function that takes those features from 1 to 0, you talked about this in the paper but I forget what you said now!

Somehow I did not implement that in GAT. Fixed now.

  • There should be a description of what each method does and the assumptions made.

Fixed.

  • In the argument list try not to put multiple variables on the same line, rather describe each variable separately on its own line.

Fixed.

@adocherty (Contributor) left a comment:

This seems to only consider the importance of features that are one, as any features that are zero will be zero for all steps. I would have guessed that there would also be a function that takes those features from 1 to 0, you talked about this in the paper but I forget what you said now!

Somehow I did not implement that in GAT. Fixed now.

You have introduced a flag which selects whether we should set the baseline to zero and go to X_val for all features, or set the baseline to X_val and go to one for all features. I was thinking that, for binary features at least, we could set the baseline to 1-X_val; then the IG can calculate the change from 1 to 0 or 0 to 1, i.e. what happens when the feature is different. Thus when we calculate node importance, we don't have to do so twice (once for features which are 1 and then again for features which are 0). What do you think?
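
A tiny sketch of the flipped-baseline idea for binary features (illustrative only):

import numpy as np

X_val = np.array([[1.0, 0.0, 1.0, 0.0]])  # binary features of one node
X_baseline = 1.0 - X_val  # each feature starts from its opposite value
X_diff = X_val - X_baseline  # +1 where the feature is 1, -1 where it is 0
# A single IG pass over this path scores both the 0 -> 1 and 1 -> 0 changes.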

    Returns:
        gradients (Numpy array): Returns a vanilla gradient mask for the nodes.
    """
-   out_indices = np.array([[node_idx]])
+   out_indices = np.array([[node_id]])
@adocherty (Contributor):

I was thinking we would look up the node index given the node IDs here. For example, the FullBatchNodeGenerator does this using the graph node_list:

node_indices = np.array([self.node_list.index(n) for n in node_ids])

We should do something similar here.
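
A small variant sketch: building the lookup table once avoids a linear list.index scan per node (assuming node IDs are hashable):

index_of = {n: i for i, n in enumerate(self.node_list)}
node_indices = np.array([index_of[n] for n in node_ids])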

@adocherty (Contributor):

We should also do this for the original GCN saliency methods. I've added an issue for this: #466

X_diff = X_val - X_baseline
total_gradients = np.zeros(X_val.shape)

for alpha in np.linspace(1.0 / steps, 1, steps):
    X_step = X_baseline + alpha * X_diff
-   total_gradients += super(IntegratedGradients, self).get_node_masks(
-       node_idx, class_of_interest, X_val=X_step
+   total_gradients += super(IntegratedGradientsGAT, self).get_node_masks(
@adocherty (Contributor):

This can be:

total_gradients += super().get_node_masks(
    node_id, class_of_interest, X_val=X_step)

@sktzwhj (Author):

Fixed.

@@ -94,8 +106,8 @@ def get_integrated_link_masks(
    for alpha in np.linspace(1.0 / steps, 1.0, steps):
        if self.is_sparse:
            A_val = sp.lil_matrix(A_val)
-       tmp = super(IntegratedGradients, self).get_link_masks(
-           alpha, node_idx, class_of_interest, int(non_exist_edge)
+       tmp = super(IntegratedGradientsGAT, self).get_link_masks(
@adocherty (Contributor):

We can just have super() here.

@sktzwhj (Author):

Fixed.

@adocherty (Contributor) commented:

IG does seem to work well for binary features compared with vanilla methods. However, it does not assume binary features. In fact, it was initially used in the image domain where features are not binary.

OK, thanks for clarifying! We can mention this in the class docstring, particularly the importance of setting X_baseline appropriately.

@sktzwhj (Contributor, Author) commented Aug 6, 2019

You have introduced a flag which selects if the we should set the baseline to zero and go to X_val for all features, or set the baseline to X_val and go to one for all features. I was thinking that, for binary features at least, we should set the baseline to 1-X_val then the IG can calculate the change from 1 to 0 or 0 to 1, i.e. what will happen when the feature is different. Thus when we calculate node importance, we don't have to do so twice (once for features which are 1 and then again for features which are 0). What do you think?

I tend to keep it as is. Although binary features were the initial motivation, we should not make the implementation specific to them. If the features are not binary, the paths 0 -> X_val and X_val -> 1 are different, so there is no room for the optimization described above. Also, under the existing frameworks, computing the gradients for only part of the matrix does not really bring much performance improvement, I think.

@sktzwhj (Contributor, Author) commented Aug 6, 2019

We need to think about how to return the link importance values aligned with node IDs rather than node indices in the saliency map parameters. In other words, we should not make users do the re-mapping manually.
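
For illustration, a hypothetical sketch of such a re-mapping; integrate_link_importance and G follow the notebook's names, and the dict layout is an assumption:

import numpy as np

graph_nodes = list(G.nodes())  # adjacency index i -> node ID
link_importance_by_id = {
    (graph_nodes[i], graph_nodes[j]): float(v)
    for (i, j), v in np.ndenumerate(integrate_link_importance)
    if abs(v) > 1e-8  # keep only effectively non-zero links
}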

@adocherty (Contributor) commented:

We need to think about how to return the link importance values aligned with node IDs rather than node indices in the saliency map parameters. In other words, we should not make users do the re-mapping manually.

That is a good point, I didn't think about that. I think we should add this as a future issue.

I'm happy with the changes!

@sktzwhj merged commit 3463008 into develop Aug 8, 2019
@youph deleted the gat-saliency-map branch January 2, 2020 00:24